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Abstract. Persistent homology is one of the most active branches of 
Computational Algebraic Topology with applications in several contexts 
such as optical character recognition or analysis of point cloud data. In 
this paper, we report on the formal development of certified programs to 
compute persistent Betti numbers, an instrumental tool of persistent ho- 
mology, using the COQ proof assistant together with the SSReflect ex- 
tension. To this aim it has been necessary to formalize the underlying 
mathematical theory of these algorithms. This is another example show- 
ing that interactive theorem provers have reached a point where they 
are mature enough to tackle the formalization of nontrivial mathemati- 
cal theories. 
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1 Introduction 

Persistent homology is a branch of Algebraic Topology which appeared simulta- 
neously in three works during the last five years of the 20th century, see [10,18,37]. 
Since that time it has become one of the central tools in the context of Compu- 
tational Algebraic Topology and several applications and extensions have been 
developed. 

In a nutshell, persistent homology is a technique which allows one to study 
the lifetime of topological attributes; this can be really useful in different con- 
texts such as point cloud data [19], optical character recognition [31], sensor 
networks [9] and surface reconstruction from noisy samples [11]. 

In this work, the main notions about persistent homology have been for- 
malized using the COQ proof assistant [8] together with the SSReflect ex- 
tension [22]. During such a process we have proved relevant theorems like the 
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Fundamental Lemma of Persistent Homology [16, pp. 152]. Moreover, we have 
implemented certified programs to compute persistent Betti numbers, an instru- 
mental tool in the context of persistent homology. 

The rest of this paper is organized as follows. The next section is devoted 
to present the mathematical notions and results which will be formalized using 
Coq/SSReflect in Section 3. The effective algorithms to compute persistent 
homology and some experiments performed with such programs are introduced 
respectively in Section 4 and Section 5. The paper ends with a section of con- 
clusions and further work. 

The interested reader can consult the original and complete source code which 
can be found at [26]. 

2 Mathematical background 

In this section, we briefly provide the necessary mathematical background needed 
to understand the paper. We mainly focus on definitions, some of them are well 
known notions of Algebraic Topology, see [32], and the rest comes from persis- 
tent homology theory [17,39,16]. We start by presenting simplicial complexes, a 
combinatorial object which can be understood as a generalization of graphs to 
higher dimensions. 

2.1 Simplicial complexes 

Let V be an ordered set, called the vertex set. An (abstract) simplex over V 
is any finite subset of V. An (abstract) n-simplex over V is a simplex over V 
whose cardinality is equal to n + 1. Given a simplex a over V, we call faces to 
the subsets of a. 

Definition 1 An (ordered abstract) simplicial complex over V is a set of sim- 
pliccs JC over V such that it is closed by taking faces (subsets), that is, if a £ JC 
all the faces of a are in JC too. 

Let JC be a simplicial complex, the set S n {JC) of n-simplices of JC is the set 
made of the simplices of cardinality n + 1. 

Example 1. Let us consider V — (0,1,2,3,4,5). The small simplicial complex 
drawn in Figure 1 is mathematically defined as the object: 

f (0), (1), (2), (3), (4), (5), (0, 1), (0, 2), (1, 2), (2, 3) 
\ (3, 4), (3, 5), (4, 5), (0,1, 2) 

Definition 2 Let JC be a simplicial complex over V. Let n and i be two integers 
such that n > 1 and < i < n. Then the face operator df is the linear map 
<9™ : S n {JC) -> S„_i(/C) defined by: 

di((vo, . . . , v n )) = (vq, . . . ,Vi-i,Vi+i, ...,v„) 
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where the i-th vertex of the simplex is removed, so that a (n — l)-simplex is 
obtained. 
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Fig. 1. Diabolo Complex 



2.2 Chain complexes 

Now, we introduce a central notion in Algebraic Topology. Notions like rings, 
modules over a ring and module morphisms (see [30] for details) are assumed to 
be known. 

Definition 3 A chain complex C* is a pair of sequences (C n ,c? n )„ e z where for 
every n E Z, C„ is a 7?.-module (with 1Z a ring) and d n : C n — > C n -\ is a module 
morphism, called the differential map, such that the composition d n d n+ i is null 
(this is known as nilpotency condition). 

The module C n is called the module of n-chains. The image B n = im d n+ \ C 
C n is the (sub)modulc of n-boundaries. The kernel Z n = kcid n C C„ is the 
(sub)module of n-cycles. 

Once we have defined the notions of simplicial complexes and chain com- 
plexes, we can define the link between them considering Z 2 as the ground ring. 
As Z 2 is a field the chain groups are vector spaces. 

Definition 4 Let /C be a simplicial complex over V. Then the chain complex 
C»(/C) canonically associated with K, is defined as follows. The chain group 
C n (fC) is the free Z 2 -module generated by the n-simplices of /C. In addition, 
let (vq, . . . , v n ) be a n-simplex of /C, the differential of this simplex is defined as: 



C n (K) is a free module and the n-simplices form the standard basis of it. 
Therefore, for all n we can represent d n : C n {K) — > C n -\{K) relative to the 
standard basis of the chain groups as a Z 2 matrix. Such a matrix is called the 
n-th incidence matrix of a simplicial complex. 

Let us present an example in order to clarify the notion of chain complex 
canonically associated with a simplicial complex. 

Example 2. Let /C be the simplicial complex defined in Figure 1. The chain 
complex C*(fC) canonically associated with /C is: 
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where the 3 associated chain groups are: 



• Co (K.) , the free Z 2 -module on the set of O-simplices (vertices) 
{(0),(1),(2),(3),(4),(5)}. 

• Ci(/C), the free Z 2 -module on the set of 1-simplices (edges) 
{(0,1), (0,2), (1,2), (2,3), (3,4), (3,5), (4,5)}. 

• C 2 (K.) , the free Z 2 -module on the set of 2-simplices (triangles) 
{(0,1,2)}. 

and the first incidence matrix, di, is: 
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Finally, we introduce one of the most important notions in the context of 
Computational Algebraic Topology. Given a chain complex C* = (C n ,d n ) ne z, 
the identities <i„_i o d n = mean that the inclusion relations B n C Z n hold, 
that is, every boundary is a cycle (the converse is generally not true). Thus the 
next definition makes sense. 

Definition 5 The n-th homology group of C„, denoted by H n (C*), is defined as 
the quotient H n (C*) = Z n /B n . The elements of H n {C*) are called n- dimensional 
homology classes of C*. 

The n-th Betti number of C», denoted by /3 n (C*), is the rank of the n-th 
homology group of C». 

In an intuitive sense, the n-th Betti number of an object X measures the 
number of n holes of X; to be more concrete, /?o measures the number of con- 
nected components, and the Betti numbers j3 n , with n > 0, measure higher 
dimensional connectedness. 

The homology groups of a simplicial complex /C are the ones associated with 
the chain complex C*(/C). Moreover, Betti numbers of a simplicial complex can 
be easily computed considering the representation of the differential maps as 
matrices using the formula: 

/3„(C*(/C)) = ns — rank(d n ) — rank(d n+ i) (1) 
where ns is the number of n-simplices. 

2.3 Persistent Homology 

We end this section by introducing the instrumental notions in persistent homol- 
ogy theory. A more detailed description of this theory can be found in [17,39,16]. 



Definition 6 Let JC be a simplicial complex, a subcomplex of K, is a subset 
C C /C that is also a simplicial complex. A filtration of /C is a nested subsequence 
of complexes: 

^° C X 1 C . . . C K m = K 

An example of a filtration can be seen in Figure 2 taking the diabolo complex 
of Figure 1 as K,. 
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Fig. 2. Filtration of the diabolo simplicial complex 



Given a filtration of a simplicial complex and the j-th component of the 
filtration, let us say K 3 , we will denote C n {K 3 ), Z n (K 3 ) and B n (K 3 ) by C&, Z£ 
and respectively. Therefore, we can represent the chain complexes associated 
with a filtration using the following diagram. 




where V n is the map induced by the inclusion between the n-simplices of K 3 and 
the ones of K 3+1 . Moreover, for j < p we will denote i 3 ^ 9 to the map induced by 
the inclusion between the n-simplices of K 3 and the ones of K p . Now, we can 
introduce the notion of persistent homology groups. 

Definition 7 The p-persistent n-th homology group of K 3 , denoted by H^p, is 
defined as the quotient = (Z 3 n ) / (BP n i J rt P (Z 3 n )). 



The p-persistent n-th Betti number of , denoted by fy p , is denned as the 
rank of H^ p . 

The elements of H^ p are the n-dimensional homology classes of Ri which 
are still alive at K p . Hence f3^ p measures the number of n-dimensional classes 
of Ri which are still alive at K' p . If we are interested in computing the n- 
dimensional homology classes which are born at and die entering K p , we 
have the following formula: 

V 3 n P = (Z^- 1 " Pn P ) ~ Wt 1 ' P - 1 - fit 1 *)- (2) 

The first difference on the right hand side of the above formula measures the 
number of n-dimensional classes which are born at or before K J and die entering 
K p , and the second one counts the number of n-dimensional classes which are 
born at or before K^~ x and die entering K v . Now, we can state the Fundamental 
Lemma of Persistent Homology. 

Theorem 8 (Fundamental Lemma of Persistent Homology) Let K° C 

K 1 C . . . C K m — K. be a filtration. For every pair of indices < k < I < m and 
every dimension n, the Z-persistent n-th Betti number of K is 

0<i<k Kj<m 

This version of the Fundamental Lemma of Persistent Homology is stated 
slightly different from the one presented in [16, pp.152]; however both of them 
are equivalent. There are two differences between the statement of Edclsbrunner 
and Harer [16, pp.152] and the one presented in Theorem 8. The former is related 
to the definition of u£ p which is defined for the case peN and has to be extended 
for p = oo. In particular for such a case, we have the following formula: 

, — _ /9J-l,oo 

P-n rn rn 

where f3^°° is defined as the n-dimensional classes which are born at or before 
Ri and never die (or die in the oo). It is worth noting that /%°° is equal to fy m 
(where m is the last level of the filtration) . 

The latter difference is a consequence of the former one and is related to 
the inner sum of the theorem which will be an infinite sum. In particular, the 
Edclsbrunner and Harer's Fundamental Lemma of Persistent Homology is stated 
as follows. 

Theorem 9 (Fundamental Lemma of Persistent Homology) [16, pp.152] 
Let K° C K 1 C . . . C K' m = IC be a filtration. For every pair of indices 
< k < I < m and every dimension n, the Z-persistent n-th Betti number of K k 
is 

E Etf ■ 

0<i<fe Kj 



As we have said previously, both formulations of the theorems are equivalent; 
however, the one presented in Theorem 8 is more suitable to be formalized since 
we do not need to handle infinite sums in the Coq/SS Reflect theorem prover. 

Finally, in order to shed light on the meaning of persistent homology, we in- 
troduce the usual way of visualizing persistence. The lifetime of a n-dimensional 
homology class can be represented as an interval; namely, a homology class which 
is born at level K l of the filtration and dies entering (with i < j) is repre- 
sented as the interval and if it is born at level K l but never dies we use 
the interval [i, oo). A barcode is defined to be the set of resulting intervals of 
a filtration. For example, the barcodes in degree and 1 associated with the 
filtration of Figure 2 are the ones depicted in Figure 3. 
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Fig. 3. Barcodes of the diabolo filtration 



The barcodes in Figure 3 should be read as follows: In the case of the /3o 
barcode, the vertex (0) is a connected component which is born at level of the 
filtration and lives forever; on the contrary, for example, the vertex (3) is born 
at level 2 of the nitrations and dies entering the level 4 when it is merged with 
another connected component. In the case of the /3i barcode (where holes are 
the 1-homology classes), a hole appears at level 1 of the filtration between the 
edges (0, 1), (0, 2), (1, 2) and dies entering the last level of the filtration when it 
is filled with the triangle (0, 1, 2); on the other hand, the hole which appears at 
level 3 of the filtration never dies. 



3 An abstract formalization using Coq/SSReflect 



Let us now introduce an abstract formalization of the notions presented in the 
previous section using COQ together with the SSReflect extension. 



3.1 Simplicial Complexes and Homology 

In previous work, see [28,27], we have formalized the notions presented in sub- 
sections 2.1 and 2.2. However, for the sake of clarity of the exposition we include 
the main definitions and results which have been developed previously. 

We begin with the notions related to simplicial complexes. The set of vertices 
is represented by a finite type V. A simplex is defined as a finite set of vertices. 
Using this, the definition of a simplicial complex as a set of simplices closed 
under inclusion is straightforward: 

Variable V : finType. 
Definition simplex := {set V}. 

Definition simplicial_complex (c : {set simplex}) := 

forall x, x \in c -> forall y : simplex, y \subset x -> y \in c. 

The definition of the n-th incidence matrix of a simplicial complex, which 
is called incidence_mx_n, takes two arguments: a set of simplices c and the 
dimension n, and returns a SSReflect matrix. Moreover, we have proved the 
nilpotency condition (see Definition 3) for two consecutive incidence matrices 
encoded with incidence_mx_n. 

Theorem incidence_matrices_sc_product : 

forall (V:finType) (n:nat) (sc: {set (simplex V)}), 
simplicial_complex sc -> 

(incidence_mx_n sc n) *m (incidence_mx_n sc (n.+l)) = 0. 

The notion of homology is defined in COQ as follows. Let F be a field, 
V1,V2,V3 vector spaces on F , and / : VI ->• V2, g : V2 -> V3 linear ap- 
plications such that g o f = 0; then, the Homology of / and g is the quotient 
between the kernel of g and the image of /. This can be defined in COQ in the 
following way. 

Variables (F : fieldType) (VI V2 V3 : vectType F) 

(f : linearApp VI V2) (g : linearApp V2 V3) . 
Definition Homology := ((lker g) :\: (limg f)). 
Definition Betti := \dim Homology. 

Finally, this definition of homology can be instantiated for the homology in 
degree n of a simplicial complex sc using the linear applications associated with 
the incidence matrices in dimension n+1 and n (given a matrix M, linearApp(M) 
builds the linear application associated with M). 

Definition Homology_sc_n (sc : {set simplex}) (n : nat) := 
Homology (LinearApp (incidence_mx_n c n.+l)) 
(LinearApp (incidence_mx_n c n) ) . 

Analogously, we can define the n-th Betti number of a simplicial complex sc 
instantiating the Betti definition. 

Definition Betti_sc_n (sc : {set simplex}) (n : nat) := 
Betti (LinearApp (incidence_mx_n c n.+l)) 
(LinearApp (incidence_mx_n c n)). 



3.2 Persistent homology 

In this section, we formalize the results presented in Subsection 2.3. First of all, 
we define a more generic notion than the one of the persistent homology group 
H 3 r i P associated with a filtration. Such a notion involves the elements presented 
in the following diagram. 

^3 

g 

f 

v 2 

where Vx, V%, V3 and V4 are vector spaces over a field F, f : Vx V% and 
g : V3 — ► V4 are linear applications and i : Vx — > V3 is an injective linear 
application. Using this we can define the vector space Pf, g ,i as the quotient 
i(ker(f)) / (im(g) n i(ker(/))). We use the vector library of SSReflect [21] in 
order to define this notion. 

Variable (F : f ieldType) . 

Variables (VI V2 V3 V4 : vectType F) 

(f : linearApp VI V2) (g : linearApp V3 V4) 
(i : linearApp VI V4) . 

Hypothesis (i_inj : injective i) . 

Definition P_fgi := 

(i @: (lker f)) :\: ((limg g) :&: (i <§ : (lker f))). 

As i is an injective linear application and (im(g) fu(ker(/))) is a subspace of 
i(ker(/)), the dimension of -P/, g ,i is equal to the dimension of ker(/) (which in 
turn is equal to the dimension of V\ minus the dimension of im(f)) minus the 
dimension of im(g) ni(ker(/)). This definition and its correctness are introduced 
in COQ as follows: 

Definition PBetti := 

vdim VI - \dim (limg f) - \dim ((limg g) :&: (i @: (lker f))). 

Lemma PBettiE : \dim P_fgi = PBetti. 

We omit the formal correctness proof for readability, we refer the interested 
reader to look at the actual formalization [26]. 

Let us now present how we instantiate these definitions for the persistent 
homology group H^ p associated with a filtration. A filtration is defined as a 
sequence of sets of simplices satisfying both that every element of the sequence 



is a simplicial complex and that the elements of the sequence are sorted using 
the subset relationship. 

Definition filtration (f : seq {set simplex V}) := 
(forall x, x \in f -> simplicial_complex x) /\ 
(f orall i j , i <= j -> i < size f -> j < size f -> 
(nth setO f i) \subset (nth setO f j)) 

In order to define the inclusion matrix i£ p of a filtration, we first introduce 
the more generic notion of inclusion matrix of two finite sets of simplices. 

Representing a matrix requires an indexing of the simplices in Left (for the 
rows) and Top (for the columns). Since Left and Top are finite sets, they are 
equipped with a canonical enumeration: (emim_val Left i) returns the i-th 
clement of the set Left. A coefficient a%j of the inclusion matrix will be 1 if the 
i-th simplex of Left is equal to the j-th simplex of Top and otherwise. 

Therefore, we can define the inclusion matrix of two finite sets of simplices 
as follows. 

Variables Left Top : {set simplex}. 

Definition inclusionMatrix := 

\matrix_(i < #|Left|, j < # I Top I) 

if (enum_val i == enum_val j) then 1 else 0:'F_2. 

In the definition above, it can be noted that the first argument of enum_val 
is implicit since it is determined by the context. Indeed, the notation i < #| 
Left I means that the type of i is ' I_(# I Left I ) , that is i is an ordinal ranging 
from to # I Left I -1, where # I X I denotes the cardinality of the set X. With this 
type information, the system expands emim_val i to enum_val Left i, thus 
resolving the ambiguity (and similarly for j). 

The type annotation 0: 'F_2 indicates that the and 1 appearing as coeffi- 
cients of the matrix are the two elements of F_2, that is, Z2 as a field. We now 
define the inclusion matrix i\f of a filtration f by instantiating Left and Top to 
the set of n-simplices of the j and, respectively, p component of f . 

Variable f : (seq {set (simplex V)}). 
Variables n j p : nat. 

Definition n_simplices (c : {set (simplex V)}):= 

[set x \in c I #|x|==n.+l]. 
Definition n_k_simplices (k : nat) := 

n_simplices (nth setO f k) n. 
Definition inclusion_matrix_n_j_p := 

inclusionMatrix (n_k_simplices j) (n_k_simplices p) . 

The main result that we have proved about inclusion_matrix_n_j_pis that 
the linear application associated with it is injective. 

Lemma injective_LinearApp_inclusion_mx : 

injective (LinearApp (inclusion_matrix_n_j_p) ) . 



Now we have all of the necessary ingredients to define the persistent homol- 
ogy group H^ p and the persistent Betti number f3^ p just instantiating P_fgi 
and PBetti with the linear applications associated with the matrices d J n , 
and v\f '. These matrices are encoded as (incidence_mx_n (nth setO f j) n) 
, (incidence_mx_n (nth setO f p) n.+l) and (inclusion_mx f n j p) re- 
spectively. 

Variable (V:finType) (f : (seq {set (simplex V)})) (n j p:nat). 
Hypothesis f _is_f iltration : filtration f . 
Hypothesis j_is_in_f iltration : j < size f. 
Hypothesis j_leq_p_is_in_f iltration : j <= p < size f . 

Definition p_persistent_n_homology_K_j := 
P_fgi (LinearApp (incidence_mx_n (nth setO f j) n))) 

(LinearApp (incidence_mx_n (nth setO f p) n.+l)) 
(LinearApp (inclusion_mx f n j p)). 

Definition p_persistent_n_betti_K_j := 
PBetti (LinearApp (incidence_mx_n (nth setO f j) n)) 

(LinearApp (incidence_mx_n (nth setO f p) n.+l)) 
(LinearApp (inclusion_mx f n j p)). 

Finally, we can define (see Formula 2) and prove our version of the 
Fundamental Lemma of Persistent Homology (Theorem 8) . 

Theorem f undamental_lemma_persistent_homology (k 1 : nat) : 

\sum_(0<=j<k.+l) (\sum_(l.+l <= p < (size f).+l) (mu n j p)) = 
(p_persistent_n_betti_K_j f n j p) - 
(p_persistent_n_betti_K_j f n j (size f)). 

The bigop library of SSRcflcct, sec [5], has played a key role in the proof 
of the above theorem. This library is devoted to generic indexed big operations, 

n 

like Yl /(*) or D /(*)) an( i their properties. Again, the interested reader can 

i=0 iei 

consult the whole development of the formal proof of the Fundamental Lemma 
of Persistent Homology in [26] . 

4 An effective certified implementation 

One of the goals of this work was the development of certified programs to 
compute both Betti and persistent Betti numbers. In the previous section we 
have provided the definitions of such notions given in terms of linear applications 
of vector spaces. However, we do not usually work with linear applications when 
computing Betti and persistent Betti numbers but with the matrices representing 
those linear applications. 

Equation 1 provides the explicit formula to compute Betti numbers from two 
matrices. So we can use it to define this new notion using SSReflect matrices 



(where ' M [K] _ (m , n) is a to x n matrix over K) and prove that this new notion 
is equivalent to the one given by the Betti definition. 

Definition Betti_rank (mxf : 'M [K] _ (vl , v2) ) (mxg: 'M [K] _ (v2, v3) ) := 
v2 - \rank mxg - \rank mxf . 

Lemma Betti_rankE (vl v2 v3 : nat) (mxf : 'M [K] _ (vl , v2) ) 
(mxg: 'M[K] _(v2,v3) ) , mxf *m mxg = -> 

dim_homology mxf mxg = Betti (LinearApp mxf) (LinearApp mxg) . 

Similarly, we can define persistent Betti numbers in terms of matrices and 
prove the equivalence between such a definition and the one given in PBetti 
definition. 

Definition PBetti_rank (mxf : 'M [K] _ (vl , v2) ) (mxg : ' M [K] _ ( v3 , v4) ) 
(mxi: 'M[K]_(vl,v4)) := 
(vl - \rank mxf - (\rank mxg + (vl - \rank mxf) - 
\rank (col_mx mxg (kermx mxf) *m mxi)))°/ n N. 

Lemma PBetti_rankE (vl v2 v3 v4 :nat) : forall (mxf : 'M [K] _(vl , v2) ) 
(mxg: 'M[K]_(v3,v4)) (mxi: 'M[K] _(vl , v4) ) , 
injective (LinearApp mxi) -> 
PBetti_rank = 

PBetti (LinearApp mxf) (LinearApp mxg) (LinearApp mxi) . 

However the use of SSReflect libraries may trigger heavy computations 
during deduction steps, that would not terminate within a reasonable amount 
of time. To handle this issue some definitions, like matrices, are locked in a way 
that do not allow direct computations. 

To overcome this pitfall, we use the methodology presented in [15] whose 
key idea is the one of refinements. Roughly speaking, the correctness of math- 
ematical algorithms are proved using all the high-level theory available in the 
SSReflect libraries and then the algorithms are refined to an implementation 
on simpler data structures that will be the ones running on the machine. In our 
particular case of matrices we use lists of lists as the low level data type for 
representing them. 

The methodology presented in [15] have been implemented as a new library, 
built on top of SSReflect libraries, which is called CoqEAL [14]. This library 
includes the refinements of almost all the algorithms involved in the computation 
of Betti and persistent Betti numbers. To be more concrete, the only algorithm 
which has been necessary to refine is the one in charge of computing the row 
kernel of a matrix. 

The kermx function is already available in the SSReflect library and im- 
plements the row kernel of a matrix. This algorithm has been refined into an 
efficient version, called ker, which works with abstract matrices. The equivalence 
between both algorithms has been proved in the following lemma. 

Lemma eqmx_ker m n (M : ' M[K]_(m,n)) : ker M :=: kermx M. 



where the notation A : = : B means that A and B are equivalent matrices. Sub- 
sequently, the ker algorithm has been translated to the low level data type, 
list of lists, with the function ker_seqmx and, eventually, we have ensure that 
ker_seqmx perform the same operation that its high-level counterpart ker. 

Now, we have all the necessary programs to define an executable version of 
both Betti and persistent Betti numbers and prove the equivalence with their 
high-level versions. 

Definition ex_Betti_rank (vl v2 v3:nat) (mxf mxg: seqmatrix K) := 
v2 - (rank_elim_seqmx v2 v3 mxg) - (rank_elim_seqmx vl v2 mxf) . 

Definition ex_PBetti_rank (vl v2 v3 v4 : nat) 
(mxf mxg mxi : seqmatrix K) := 
let rf := rank_elim_seqmx vl v2 mxf in 
let rg := rank_elim_seqmx v3 v4 mxg in 
vl - rf - (rg + (vl - rf) - 

rank_elim_seqmx (v3 + size (ker_seqmx vl v2 mxf)) v4 

(col_seqmx mxg (mulseqmx (ker_seqmx vl v2 mxf) mxi))). 

Lemma ex_Betti_rankE: forall (vl v2 v3 : nat) (mxf: 'M [K] _(vl , v2) ) 
(mxg : 'M[K]_(v2,v3)) , 

ex_Betti_rank vl v2 v3 (seqmx_of_mx K mxf) (seqmx_of_mx K mxg) 
= Betti_rank mxf mxg. 

Lemma ex_PBetti_rank_PBetti_rank_E : forall (vl v2 v3 v4 :nat) 
(mxf: 'M[K]_(vl,v2)) (mxg : 'M[K] _(v3,v4)) (mxi : 'M[K]_(vl,v4)) , 
ex_PBetti_rank vl v2 v3 v4 (seqmx_of_mx K mxf) (seqmx_of_mx K 
mxg) (seqmx_of_mx K mxi) = PBetti_rank mxf mxg mxi. 



It is worth noting that the executable functions on matrices, represented as lists 
of lists, usually need the size of the matrices, as can be seen for instance in the 
rank_elim_seqmx function. Moreover, the function seqmx_of _mx is the one in 
charge of transforming abstract matrices into lists of lists (which are encoded as 
the type seqmatrix). 

Following the same pattern, we have defined executable simplicial complexes 
and their connection with the computation of Betti and persistent Betti numbers. 
In particular, the ex_Betti_sc function takes as argument a simplicial complex 
c and a natural number n and computes the n-th Betti number of c and the 
ex_p_persistent_n_Betti_K_j function takes as argument a filtration f and 
three natural numbers p , n , j and computes the p persistent n-th Betti number 
of the j level of the filtration f . Some examples of the usage of these functions 
are introduced in the following section. 



5 Experimental results 



In this section we try to clarify how Betti and persistent Betti numbers can be 
computed within COQ. Let us start with the computation of Betti numbers of 
the simplicial complex of Figure 1. 

Simplicial complexes are built in COQ providing their facets. A facet of a 
simplicial complex /C is a maximal simplex with respect to the subset order 
C among the simplices of /C. To construct the simplicial complex associated 
with a sequence of facets, J 7 , we generate all the faces of the simplices of T\ 
subsequently, if we perform the set union of all the faces we obtain the simplicial 
complex associated with J 7 . This procedure have been implemented, and its 
correctness have been proved, using COQ in [28]. In the case of the diabolo 
complex of Figure 1 its facets are: {(2, 3), (3, 4), (3, 5), (4, 5), (0, 1, 2)}. 

The procedure to compute Betti numbers of the diabolo complex is as follows. 
First, we define the list of facets: 

Definition diabolo_f acets := 

[: : [: :2;3] ; [: :3;4] ; [: :3;5] ; [: :4;5] ; [: :0;1;2]] . 

and, subsequently, we compute Betti numbers (of dimension and 1) using the 
instruction ex_Betti_sc which takes as arguments the facets of the simplicial 
complex and the dimension. 

Eval vm_ compute in ex_Betti_sc diabolo_f acets 0. 
Eval vm_ compute in ex_Betti_sc diabolo_f acets 1. 

obtaining in both cases 1 (this means that the diabolo complex has a connected 
component and a hole) in just milliseconds. 

The procedure to compute persistent Betti numbers is quite similar. First 
of all, we define the filtration providing the facets of each one of the simplicial 
complexes of the filtration. In the case of the diabolo filtration of Figure 2, the 
representation in COQ of the filtration is the following one: 

Definition diabolo_f iltration := 



[: :0] 

0;1] 

0;1] 

0;1] 

0;1] 

2;3] 



[ 



:1];[: 
1;2];[ 
1;2];[ 
1;2] ; [ 
1;2];[ 
3;4] ; [ 



2]] ; 
0;2]] ; 
0;2] ; [ 
0;2] ; [ 
0;2] ; [ 
3;5] ; [ 



3];[::4];[::5]]; 

3;4] ; [: :4;5] ; [: :3;5]] ; 

3;4] ; [: :4;5] ; [: :3;5] ; [: :2;3]] 

4;5] ; [: :0;1;2]]] . 



Using this, we can compute persistent Betti numbers by calling the func- 
tion ex_p_persistent_n_betti_K_j. Therefore, we can combine the functions 
to compute Betti and persistent Betti numbers in order to obtain information 
about the filtration. For instance, if we want to know how many connected 
components which live at the level of the filtration (this is computed by (nth 
nil diabolo_f iltration 0)) are still alive at level 4, we use the following 
instructions: 



Eval vm_compute in ex_Betti_sc (nth nil diabolo_f iltration 0) 0. 
Eval vm_ compute in 

ex_p_persistent_n_Betti_K_j diabolo_f iltration 4. 

obtaining as result 3 and 1 respectively. This means that there are three con- 
nected components at level of the filtration but just one of them is still alive 
at level 4. 

As a benchmark to test the efficiency of COQ programs, we have considered 
several random simplicial complexes and filtrations generated from a fixed num- 
ber of triangles. The results can be seen in Table 1 where we show for each 
number of triangles the times (in seconds) to compute both Betti and persistent 
Betti numbers. 
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2 
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25 


190 


2731 



Table 1. Execution times 



Of course, our COQ programs take much more time to compute Betti and per- 
sistent Betti numbers than special purpose software packages such as Chomp [1] 
and the GAP homology package [13] for Betti numbers or JavaPlex [38] and 
Dionysus [34] for persistent Betti numbers. However, it is worth remarking that 
COQ is an Interative Theorem Prover and in this kind of systems, unlike Com- 
puter Algebra systems or special purpose packages, efficient computational ca- 
pabilities have not been the main goal up to now. 

Nevertheless, things are changing and there is an on-going effort in the im- 
plementation of efficient mathematical algorithms running inside COQ. In this 
line, we can highlight the works on machine integers and arrays [4] , efficient real 
numbers [23] or an approach which consists in internally compiling COQ terms 
to the functional programming language OCaml [33]. 

6 Conclusions and further work 

In this paper we have presented a set of formally verified programs which allows 
us to effectively compute persistent Betti numbers within COQ. To carry out 
this task, it has been necessary a formalization of the basic notions related to 
persistent homology. Moreover, we have formalized relevant theorems like the 
Fundamental Lemma of Persistent Homology. This illustrates that Interactive 
Theorem Provers are mature enough to tackle the formalization of theories in 
non-trivial mathematical domains. This fact can be seen also in the proof of the 
Four Color Theorem [20], in the Flyspeck project [25], devoted to the formal 
proof of the Kepler conjecture [24] ; or in the classification of finite groups [2] . 



One of our main concerns for the future is associated with the formalization 
of efficient mathematical algorithms. This is a necessary effort which has to be 
carried out before undertaking other of our goals: the application of our programs 
to biomedical problems. 

Homological techniques have been successfully applied in the biomedical con- 
text, see [36,35,27]. In this environment it is necessary to have both efficient and 
reliable software systems; therefore the use of formally verified efficient algo- 
rithms seems desirable. 

It is also appealing to use this work as a basis for further developments. We 
can tackle the formalization of different extensions of persistent homology; for 
instance, multidimensional persistence [7] or ZigZag persistence [6]. 

In summary we can conclude that we are working towards an efficient formal 
library of Computational Algebraic Topology, in this line of work we can mention 
the formalizations of Effective Homology [3,12] and Discrete Morse Theory [29], 
however more work is still necessary to reach our goal. 
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